Smart Paper Notes

Modern Machine-Learning Predictive Models for Diagnosing Infectious Diseases

Eman Yahia Alqaissi , Fahd Saleh Alotaibi , Muhammad Sher Ramzan

Categories: Machine Learning | Artificial Intelligence

2022-06-15

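Controlling infectious diseases is a major health priority because they can spread and infect humans, evolving into epidemics or pandemics. Early detection of infectious diseases is therefore an important need, and many researchers have developed models to diagnose them at an early stage. This paper reviews research articles on recent machine-learning (ML) algorithms applied to infectious disease diagnosis. We searched the Web of Science, ScienceDirect, PubMed, Springer, and IEEE databases from 2015 to 2022, identified the strengths and weaknesses of the reviewed ML models, and discussed possible recommendations for advancing research in this field. We found that most of the articles used small datasets, and few of them used real-time data. Our results show that the suitable ML technique depends on the nature of the dataset and the desired goal.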

Exploring the Use of Data-Driven Approaches for Anomaly Detection in the Internet of Things (IoT) Environment

Eleonora Achiluzzi , Menglu Li , Md Fahd Al Georgy , Rasha Kashef

Categories: Machine Learning

2022-12-31

The Internet of Things (IoT) is a system that connects physical computing devices, sensors, software, and other technologies. Data can be collected, transferred, and exchanged with other devices over the network without requiring human interaction. One challenge the development of IoT faces is the presence of anomalous data in the network. Research on anomaly detection in the IoT environment has therefore become popular and necessary in recent years. This survey provides an overview of the current progress of different anomaly detection algorithms and how they can be applied in the context of the Internet of Things. We categorize the widely used machine learning and deep learning techniques for anomaly detection in IoT into three types: clustering-based, classification-based, and deep learning-based. For each category, we introduce some state-of-the-art anomaly detection methods and evaluate the advantages and limitations of each technique.

Data Augmentation using Transformers and Similarity Measures for Improving Arabic Text Classification

Dania Refai , Saleh Abo-Soud , Mohammad Abdel-Rahman

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-12-28

Learning models depend heavily on data to work effectively, and they perform better when trained on large datasets. Extensive research in the literature addresses the dataset adequacy issue. One promising approach to this issue is data augmentation (DA). In DA, the number of training data instances is increased by applying different transformations to the available instances to generate new, correct, and representative ones. DA increases the dataset size and its variability, which improves model performance and prediction accuracy. DA also helps with the class imbalance problem in classification learning techniques. Few studies have recently considered DA for the Arabic language. These studies rely on traditional augmentation approaches, such as rule-based paraphrasing or noising-based techniques. In this paper, we propose a new Arabic DA method that employs a recent powerful modeling technique, namely AraGPT-2, for the augmentation process. The generated sentences are evaluated in terms of context, semantics, diversity, and novelty using the Euclidean, cosine, Jaccard, and BLEU distances. Finally, the AraBERT transformer is used on sentiment classification tasks to evaluate the classification performance of the augmented Arabic dataset. The experiments were conducted on four Arabic sentiment datasets, namely AraSarcasm, ASTD, ATT, and MOVIE. The selected datasets vary in size, number of labels, and class imbalance. The results show that the proposed methodology improved Arabic sentiment text classification on all datasets, with an increase in F1 score of 4% on AraSarcasm, 6% on ASTD, 9% on ATT, and 13% on MOVIE.
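
As a rough illustration of three of the measures named in the abstract above (Euclidean, cosine, Jaccard), the sketch below compares an original sentence with a generated one. It assumes whitespace tokenization and bag-of-words count vectors, and it uses English placeholder sentences; the abstract does not say which sentence representation the authors used, so none of this should be read as their implementation.

```python
import math
from collections import Counter


def tokenize(sentence):
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return sentence.lower().split()


def cosine_similarity(a, b):
    """Cosine similarity between bag-of-words count vectors."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Size of the shared vocabulary divided by the combined vocabulary."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def euclidean_distance(a, b):
    """Euclidean distance between bag-of-words count vectors."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    vocab = set(ca) | set(cb)
    return math.sqrt(sum((ca[t] - cb[t]) ** 2 for t in vocab))


# Hypothetical original and augmented sentences.
original = "the film was great"
generated = "the movie was great"
print(cosine_similarity(original, generated))
print(jaccard_similarity(original, generated))
print(euclidean_distance(original, generated))
```

A generated sentence that scores very close to the original on all three measures adds little diversity, while one that scores very far may have drifted in meaning, which is presumably why several measures are reported together.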

Data-driven control of COVID-19 in buildings: a reinforcement-learning approach

Ashkan Haji Hosseinloo , Saleh Nabi , Anette Hosoi , Munther A. Dahleh

Categories: Artificial Intelligence | Machine Learning

2022-12-27

In addition to its public health crisis, the COVID-19 pandemic has led to the shutdown and closure of workplaces, with an estimated total cost of more than $16 trillion. Given the long hours an average person spends in buildings and indoor environments, this research article proposes data-driven control strategies to design optimal indoor airflow that minimizes the exposure of occupants to viral pathogens in built environments. A general control framework is put forward for designing an optimal velocity field, and proximal policy optimization, a reinforcement learning algorithm, is employed to solve the control problem in a data-driven fashion. The same framework is used for the optimal placement of disinfectants to neutralize the viral pathogens, as an alternative to airflow design when the latter is practically infeasible or hard to implement. We show, via simulation experiments, that the control agent learns the optimal policy in both scenarios within a reasonable time. The proposed data-driven control framework will have significant societal and economic benefits by laying the foundation for an improved methodology for designing case-specific infection control guidelines that can be realized with affordable ventilation devices and disinfectants.

Improving the Robustness of Summarization Models by Detecting and Removing Input Noise

Kundan Krishna , Yao Zhao , Jie Ren , Balaji Lakshminarayanan , Jiaming Luo , Mohammad Saleh , Peter J. Liu

Categories: Natural Language Processing | Machine Learning

2022-12-20

The evaluation of abstractive summarization models typically uses test data that is identically distributed as training data. In real-world practice, documents to be summarized may contain input noise caused by text extraction artifacts or data pipeline bugs. The robustness of model performance under distribution shift caused by such noise is relatively under-studied. We present a large empirical study quantifying the sometimes severe loss in performance (up to 12 ROUGE-1 points) from different types of input noise for a range of datasets and model sizes. We then propose a light-weight method for detecting and removing such noise in the input during model inference without requiring any extra training, auxiliary models, or even prior knowledge of the type of noise. Our proposed approach effectively mitigates the loss in performance, recovering a large fraction of the performance drop, sometimes as large as 11 ROUGE-1 points.
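
For readers unfamiliar with the metric quoted above, ROUGE-1 measures unigram overlap between a candidate summary and a reference summary. The sketch below is a minimal version that assumes whitespace tokenization and no stemming; published ROUGE numbers normally come from a standard toolkit and are usually reported on a 0 to 100 scale, so the function name and example strings here are illustrative only.

```python
from collections import Counter


def rouge_1(reference, candidate):
    """Return (precision, recall, f1) based on clipped unigram overlap."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical reference and model summaries.
precision, recall, f1 = rouge_1("the cat sat on the mat", "the cat sat")
print(precision, recall, f1)
```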

Adaptive Uncertainty Distribution in Deep Learning for Unsupervised Underwater Image Enhancement

Alzayat Saleh , Marcus Sheaves , Dean Jerry , Mostafa Rahimi Azghadi

Categories: Computer Vision

2022-12-18

One of the main challenges in deep learning-based underwater image enhancement is the limited availability of high-quality training data. Underwater images are difficult to capture and are often of poor quality due to the distortion and loss of colour and contrast in water. This makes it difficult to train supervised deep learning models on large and diverse datasets, which can limit the model's performance. In this paper, we explore an alternative approach to supervised underwater image enhancement. Specifically, we propose a novel unsupervised underwater image enhancement framework that employs a conditional variational autoencoder (cVAE) to train a deep learning model with probabilistic adaptive instance normalization (PAdaIN) and statistically guided multi-colour space stretch that produces realistic underwater images. The resulting framework is composed of a U-Net as a feature extractor and a PAdaIN to encode the uncertainty, which we call UDnet. To improve the visual quality of the images generated by UDnet, we use a statistically guided multi-colour space stretch module that ensures visual consistency with the input image and provides an alternative to training using a ground truth image. The proposed model does not need manual human annotation and can learn with a limited amount of data and achieves state-of-the-art results on underwater images. We evaluated our proposed framework on eight publicly-available datasets. The results show that our proposed framework yields competitive performance compared to other state-of-the-art approaches in quantitative as well as qualitative metrics. Code available at https://github.com/alzayats/UDnet .

Natural Language Processing in Customer Service: A Systematic Review

Malak Mashaabi , Areej Alotaibi , Hala Qudaih , Raghad Alnashwan , Hend Al-Khalifa

Categories: Natural Language Processing | Artificial Intelligence

2022-12-16

Artificial intelligence and natural language processing (NLP) are increasingly being used in customer service to interact with users and answer their questions. The goal of this systematic review is to examine existing research on the use of NLP technology in customer service, including the research domain, applications, datasets used, and evaluation methods. The review also looks at the future direction of the field and any significant limitations. The review covers the time period from 2015 to 2022 and includes papers from five major scientific databases. Chatbots and question-answering systems were found to be used in 10 main fields, with the most common use in general, social networking, and e-commerce areas. Twitter was the second most commonly used dataset, with most research also using their own original datasets. Accuracy, precision, recall, and F1 were the most common evaluation methods. Future work aims to improve the performance and understanding of user behavior and emotions, and address limitations such as the volume, diversity, and quality of datasets. This review includes research on different spoken languages and models and techniques.
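
The four evaluation measures listed above can all be derived from the counts of a binary confusion matrix. The following is a small sketch under that binary assumption; the label lists are made-up examples, and multi-class studies would typically average these scores per class.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical gold labels and predictions.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]
print(binary_metrics(y_true, y_pred))
```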

RIGA: Rotation-Invariant and Globally-Aware Descriptors for Point Cloud Registration

Hao Yu , Ji Hou , Zheng Qin , Mahdi Saleh , Ivan Shugurov , Kai Wang , Benjamin Busam , Slobodan Ilic

Categories: Computer Vision

2022-09-27

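Successful point cloud registration relies on accurate correspondences established upon powerful descriptors. However, existing neural descriptors either leverage a rotation-variant backbone whose performance declines under large rotations, or encode local geometry that is less distinctive. To address this issue, we introduce RIGA to learn descriptors that are rotation-invariant by design and globally aware. From the point pair features (PPFs) of sparse local regions, rotation-invariant local geometry is encoded into geometric descriptors. Global awareness of 3D structures and geometric context is subsequently incorporated, both in a rotation-invariant fashion. More specifically, the 3D structures of the whole frame are first represented by our global PPF signatures, from which structural descriptors are learned to help geometric descriptors sense the 3D world beyond local regions. Geometric context from the whole scene is then globally aggregated into the descriptors. Finally, the descriptions of sparse regions are interpolated to dense point descriptors, from which correspondences are extracted for registration. To validate our approach, we conduct extensive experiments on both object-level and scene-level data. Under large rotations, RIGA surpasses state-of-the-art methods by a margin of 8° in terms of relative rotation error on ModelNet40 and improves feature matching recall by at least 5 percentage points on 3DLoMatch.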
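
To show why point pair features are a natural starting point for rotation-invariant descriptors, the sketch below computes the classic four-dimensional PPF of two oriented points (distance, and three angles between the normals and the connecting vector) and checks that it is unchanged when both points are rotated and translated together. This is the textbook PPF definition written in plain Python; it is an assumption that it matches the variant used in RIGA, and the learned descriptors themselves are not reproduced here.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def angle(u, v):
    """Angle in radians between two vectors, clamped for numerical safety."""
    nu, nv = norm(u), norm(v)
    if nu == 0 or nv == 0:
        return 0.0
    cos_uv = max(-1.0, min(1.0, dot(u, v) / (nu * nv)))
    return math.acos(cos_uv)


def point_pair_feature(p1, n1, p2, n2):
    """Classic PPF: (|d|, angle(n1, d), angle(n2, d), angle(n1, n2))."""
    d = sub(p2, p1)
    return (norm(d), angle(n1, d), angle(n2, d), angle(n1, n2))


def rotate_z(v, theta):
    """Rotate a 3D vector about the z-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    x, y, z = v
    return (c * x - s * y, s * x + c * y, z)


# Hypothetical oriented points: positions p and unit normals n.
p1, n1 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
p2, n2 = (1.0, 2.0, 2.0), (0.0, 1.0, 0.0)
before = point_pair_feature(p1, n1, p2, n2)

# Apply the same rigid motion to both points; normals are only rotated.
theta, shift = 1.1, (5.0, -3.0, 2.0)
move = lambda p: tuple(a + b for a, b in zip(rotate_z(p, theta), shift))
after = point_pair_feature(move(p1), rotate_z(n1, theta),
                           move(p2), rotate_z(n2, theta))

print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(before, after)))
```

Because the feature only uses a length and relative angles, any rigid motion applied to the pair leaves it unchanged, which is the property the abstract relies on.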

Traffic Accident Risk Forecasting using Contextual Vision Transformers

Khaled Saleh , Artur Grigorev , Adriana-Simona Mihaita

Categories: Computer Vision | Artificial Intelligence

2022-09-20

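Recently, the problem of traffic accident risk forecasting has been attracting the attention of the intelligent transportation systems community because of its significant impact on traffic clearance. In the literature, this problem is commonly tackled with data-driven approaches that model the spatial and temporal impact of incidents, since both have been shown to be crucial for traffic accident risk forecasting. To achieve this, most approaches build separate architectures to capture spatio-temporal correlation features, which makes them inefficient for large traffic accident datasets. In this work, we therefore propose a novel unified framework, namely a contextual vision transformer, that can be trained end to end and can effectively reason about the spatial and temporal aspects of the problem while providing accurate traffic accident risk predictions. We evaluate and compare the performance of our proposed methodology against baseline approaches from the literature on two large-scale traffic accident datasets from two different geographical locations. The results show a significant improvement of roughly 2% in RMSE score compared with previous state-of-the-art (SoTA) works in the literature. Moreover, our proposed approach outperforms the SoTA technique on both datasets while requiring 23 times fewer computational resources.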
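
The RMSE score used for comparison above is the square root of the mean squared difference between predicted and observed values. A minimal sketch follows; the risk values are invented for illustration, and the abstract does not specify how the authors aggregate RMSE over space and time.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical observed and forecast accident-risk values per grid cell.
observed = [3.0, 5.0, 2.0, 8.0]
forecast = [2.0, 5.0, 4.0, 8.0]
print(rmse(observed, forecast))
```

Lower values are better, so an improvement in RMSE means the reported score went down relative to the baseline.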

LINGUIST: Language Model Instruction Tuning to Generate Annotated Utterances for Intent Classification and Slot Tagging

Andy Rosenbaum , Saleh Soltan , Wael Hamza , Yannick Versley , Markus Boese

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-09-20

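We present LINGUIST, a method for generating annotated data for intent classification and slot tagging (IC+ST) by fine-tuning AlexaTM 5B, a 5-billion-parameter multilingual sequence-to-sequence (seq2seq) model, on a flexible instruction prompt. In a 10-shot novel intent setting on the SNIPS dataset, LINGUIST surpasses state-of-the-art approaches (back-translation and example extrapolation) by a wide margin, showing absolute improvements for the target intents of +1.9 points on IC recall and +2.5 points on ST F1 score. In the zero-shot cross-lingual setting of the mATIS++ dataset, LINGUIST outperforms a strong baseline of machine translation with slot alignment by +4.14 points absolute on ST F1 score across 6 languages, while matching performance on IC. Finally, we verify our results on an internal large-scale multilingual dataset for conversational agent IC+ST and show significant improvements over a baseline that uses back-translation, paraphrasing, and slot catalog resampling. To our knowledge, we are the first to demonstrate instruction fine-tuning of a large-scale seq2seq model to control the outputs of multilingual intent- and slot-labeled data generation.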
